ABSTRACT


This paper is motivated by the assumption that many problems
dealing with arbitrarily related data can be expedited on a digital
computer by a storage structure which allows rapid execution of
operations within and between sets of datum names. Such a structure
should allow any set-theoretic operation without restricting the type
of sets involved, thus allowing operations on sets of sets of ...;
sets of ordered pairs, ordered triples, ordered ...; sets of
variable-length n-tuples; n-tuples of arbitrary sets; etc., with the
assurance that these operations will be executed rapidly. The purpose
of a Set-Theoretic Data Structure (STDS) is to provide a storage
representation for arbitrarily related data allowing quick access,
minimal storage, and extreme flexibility. This paper describes an
STDS with the above properties, using a general implementation
suitable for paging in a mass memory system.
I. INTRODUCTION


The overall goal, of which this paper is a part, is the
development of a machine-independent data structure allowing rapid
processing of data related by arbitrary assignment, such as the
contents of a telephone book, library files, census reports, family
lineage, graphic displays, information retrieval systems, networks,
etc. Data which are non-intrinsically related have to be expressed
(stored) in such a way as to define the way in which they are related
before any data structure is applicable. Since any relation can be
expressed in set theory as a set of ordered pairs, and since set
theory provides a wealth of operations for dealing with relations, a
set-theoretic data structure appears worth investigating.

A Set-Theoretic Data Structure (STDS) is a storage
representation of sets and set operations such that, given any family
of sets $\eta$ and any collection $S$ of set operations, an STDS is
any storage representation which is isomorphic to $\eta$ with $S$.
The language used with an STDS may contain any set-theoretic
expression capable of construction from $\eta$ and $S$. Every stored
representation of a set must preserve all the properties of that set,
and every representation of a particular set must behave identically
under set operations.
II. GENERAL STORAGE REPRESENTATION


An STDS comprises five structurally independent parts:

1. a collection of set operations $S$.

2. a set of datum names $\beta$.

3. the data: a collection of datum definitions, one for each
datum name.

4. a collection of set names $\eta$.

5. a collection of set representations, each with a name in
$\eta$.

The storage representation is shown schematically in Figure 1.
In order for an STDS to be practical the set operations must be
executed rapidly. If any two sets can be well-ordered (a linear order
with a first element) such that their union preserves this
well-ordering, then the subroutines needed for set operations just
involve a form of merge or, at worst, a binary search of just one of
the sets. It was shown in another paper [1] that any set defined over
$\beta$ can be so ordered.

Sets are represented by blocks of contiguous storage locations,
with $\eta$ containing the names of all the sets. The set $\beta$ is
the set of all datum names, and is represented by a contiguous block
of storage locations; the address of a location in the $\beta$-block
is a datum name and an element of $\beta$. The content of a location
in the $\beta$-block is the address of a stored description of that
datum (see Figure 1). The contents of the $\beta$-block and the
$\eta$-block are the only pointers needed for the operation of an
STDS. The storage representations of the individual sets do not
contain pointers to other sets, but contain information about datum
names. Since each set representation has only one pointer associated
with it, the set representation can be moved throughout storage
without affecting its contents or the contents of any other set
representation; only the one pointer in $\eta$ is affected.

Updating set representations is virtually trivial. Elements to
be deleted are replaced by the last element in the set. Elements to
be added are added to the end of the set representation as space
allows. When contiguous locations are no longer available a new set
is formed, and the element in $\eta$ that referenced the set before
it was extended now references a location indicating that the set is
now the union of two set representations. (In a paging structure such
sets could be kept on the same page.) This demonstrates two different
kinds of sets in $\eta$: generator sets and composite sets. Only the
generator sets have storage representations; the composite sets are
unions of generator sets, and the generator sets are mutually
disjoint. Since no duplication of storage of sets is necessary, and
since the set representations are kept to a minimum by containing
just the elements of the sets and no pointers, an STDS is
intrinsically a minimal storage representation for arbitrarily
related data.
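
The remark that well-ordered sets need only "a form of merge or, at worst, a binary search" can be illustrated with a short sketch. It is written in Python rather than the paper's FORTRAN/MAD, and it assumes sets are held as ascending, duplicate-free lists of integer datum names; the function names are hypothetical and are not the paper's subroutines.

```python
from bisect import bisect_left


def union_sorted(a, b):
    """Merge two ascending, duplicate-free lists into their ordered union."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:                      # same element in both sets: keep one copy
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])              # at most one of these tails is non-empty
    out.extend(b[j:])
    return out


def intersect_sorted(a, b):
    """Single merge pass that keeps only the elements present in both lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif b[j] < a[i]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def contains_sorted(a, x):
    """Membership test by binary search of a single ordered set."""
    k = bisect_left(a, x)
    return k < len(a) and a[k] == x
```

Both merge routines visit each element at most once, so their cost grows with the combined size of the two sets, and the result is again in ascending order and can be fed to a further operation without re-sorting.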


III.  OPERATION  OF  AN  STDS 


An STDS relies on set operations to do the work usually
allocated to pointers or hash-coding, as in list structures, ring
structures, associative structures, and relational files. A set
operation of $S$ is represented by a subroutine which accesses sets
through pointers in $\eta$. Again it should be stressed that no
pointers exist between sets; hence the set operations $S$ act as the
only structural ties between sets. Since $S$ will allow any
set-theoretic operation, $S$ will be rich enough that all information
between sets may be expressed by a set-theoretic expression generated
from the operations of $S$. Any expression establishes which sets are
to be accessed and which operations are to be performed within and
between these sets; therefore all pages needed for completion of an
expression are known before the expression is executed.

Complementing the set operation subroutines are some strictly
storage manipulation subroutines. These, however, are not reflected
in any set-theoretic expression. These routines change storage modes
and perform sorts and orderings. A fast sort routine has been
programmed with execution times that are a linear function of the
number of words to be sorted. (On an IBM 7090 this sort ordered 1000
words in 0.35 seconds and 10,000 words in 3.3 seconds. The nature of
this sort is such that on an IBM 360/67 it may sort up to 60,000
bytes per second. This routine is presently being programmed.)
Another subroutine which is crucial to the operation of an STDS is
the tau-ordering routine [1]. This routine gives a well-ordering
which is preserved under union.

Figure 1. Storage Schema of an STDS.


IV. DETAILS OF $\beta$-BLOCK


The $\beta$-block may be a section of contiguous* storage
locations with $\beta_0$ as the address of the head location. The
first location containing a datum-pointer has the address
$\beta_0 + 1$, and the location of the i-th datum-pointer is
$\beta_0 + i$. Let $\#\beta$ represent the total number of
datum-pointers; then the last address of the $\beta$-block is
$\beta_0 + \#\beta$. $\beta$ is the set of datum-names, or locations
of datum-pointers in the $\beta$-block. Since all datum-pointers are
located between $\beta_0 + 1$ and $\beta_0 + \#\beta$, let $\beta$ be
the set of integers $\{1, 2, \ldots, \#\beta\}$. Therefore any
integer $i$ such that $1 \le i \le \#\beta$ is the datum-name for the
i-th datum-pointer. The i-th datum-pointer locates a block of storage
containing a description of the i-th datum and all the generator set
names (elements of $\eta$) for which the i-th datum name is a
constituent (see Figure 1).


* The $\beta$-block may also be represented by $n$ disjoint
contiguous $\beta_i$-blocks such that
$\beta = \beta_1 \cup \beta_2 \cup \ldots \cup \beta_n$.
V. DETAILS OF $\eta$-BLOCK


The $\eta$-block is similar to the $\beta$-block, with
$\eta_0$ and $\#\eta$ as the address of the head location and the
cardinality respectively. The contents of the $\eta$-block are
pointers. These pointers are of two types and are distinguished by an
integer $\eta^*$ such that $1 < \eta^* \le \#\eta$. For all
$1 \le i < \eta^*$, $i$ is the name of a generator set, and for all
$\eta^* \le i \le \#\eta$, $i$ is the name of a composite set. A
generator set has a set representation, while a composite set does
not, since it is the union of some generator sets. For
$i \ge \eta^*$ the pointer in $\eta_0 + i$ locates a section of
storage containing names of generator sets. For $i < \eta^*$ the
pointer in $\eta_0 + i$ locates a section of storage containing all
composite set names that use $i$, and a pointer to the set
representation of $i$. Since all generator sets are mutually disjoint
and since only generator sets have a storage representation, there is
no duplication of storage in an STDS. Let the class of generator sets
be $G$ and the class of composite sets be $C$; then
$G = \{1, \ldots, \eta^* - 1\}$, $C = \{\eta^*, \ldots, \#\eta\}$,
and $\eta = G \cup C$ (see Figure 1).
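
As an illustration of the split between generator and composite sets, the following Python sketch resolves a set name to its datum names. It is a simplification under stated assumptions: the $\eta$-block is modelled as a dictionary keyed by set name, a generator entry holds its ordered representation directly, and the back-references from a generator to the composites that use it are omitted. The names `resolve`, `eta`, and `eta_star` are hypothetical.

```python
def resolve(eta, eta_star, i):
    """Return the datum names of set i as an ascending list.

    eta      -- dict mapping set name -> entry (assumed layout, see lead-in)
    eta_star -- first composite name; names below it are generator sets
    """
    if i < eta_star:
        # Generator set: the entry is the stored representation itself.
        return list(eta[i])
    members = []
    for g in eta[i]:
        # Composite set: the entry lists generator names to be united.
        members.extend(eta[g])
    # Generator sets are mutually disjoint, so no duplicates can arise.
    return sorted(members)


# Three generator sets (names 1..3) and two composite sets (names 4, 5).
eta = {
    1: [1, 4],
    2: [2, 7],
    3: [5],
    4: [1, 3],        # composite: union of generators 1 and 3
    5: [1, 2, 3],     # composite: union of generators 1, 2 and 3
}
eta_star = 4
```

With this layout, moving or extending a generator set touches only its own entry; every composite that names it sees the change without any stored copy being updated.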


VI.  SET  REPRESENTATION 


In order to ensure fast execution times for the set
operations in $S$, the sets involved must be isomorphic to a unique
linear representation of their elements. Unique is used here to mean
unique relative to some predefined well-ordering relation, such that
independently of how the set is presented to a machine the ordering
of its elements will always be the same. This well-ordering must be
preserved under union. Any ordering satisfying the above conditions
is adequate for the efficient operation of an STDS [1].

Since the set representations must be isomorphic to the sets
they represent, every set representation must reflect the rank and
preserve the order (if any) of the sets and their elements. Let
$A = \langle a,b,c \rangle$, $B = \{a,b,c\}$, and $C = \{c,b,a\}$;
then $B$ and $C$ must have the same set representation, while $A$
must have a completely different representation. For simple sets like
these, adequate representations are trivial; such is not always the
case, however.
VII. COMPLEXES AND N-TUPLES


If an STDS is to be general, then it will have to accommodate
more imaginative sets than the ones above. Let

$W = \{a, b, \{\{c\}\}, \langle a,\{b,d\},c \rangle, \langle\langle a,b \rangle, c \rangle\}$

and

$V = \{\langle a,b,c \rangle, \langle \{\langle a,b \rangle, \langle c,d \rangle\}, \langle d,a \rangle\rangle, \{\{c\}\}, b\}$.

In order for set operations on these sets to fall within the allotted
time bounds, the storage representations of $W$ and $V$ must satisfy
the well-ordering conditions. Such a representation is not
immediately obvious. Two problems arise.

1. The first problem is machine-oriented in that an ordered
set in set theory is defined through nesting and repetition of the
elements of the set. For example, the Kuratowski definition of
ordered pair gives $\langle a,b \rangle = \{\{a\},\{a,b\}\}$. Since
any machine representation will induce an order on the elements of a
set by their location in storage, this may be utilized instead of
relying on redundancy of storage. This in turn may present problems
in preserving the isomorphism between sets and their set
representations, since an unordered set must have a unique
representation and no ordering on its elements.

2. The second problem is closely allied with the first, except
that it is more biased towards the foundations of set theory. There
seems to be a general lack of precision in set theory when ordering
beyond a pair is involved. No set representation of ordered triples,
ordered quadruples, quintuples, sextuples, etc. is given save for an
arbitrary assignment in terms of ordered pairs. (This problem is
discussed by Skolem [3].) For example, $\langle a,b,c,d \rangle$ has
no set equivalent independent of ordered pairs; it is given one of
the following as its canonical form:

    $\langle\langle a,b \rangle, \langle c,d \rangle\rangle$;
    $\langle a, \langle b, \langle c,d \rangle\rangle\rangle$;
    $\langle a, \langle\langle b,c \rangle, d \rangle\rangle$;
    $\langle\langle\langle a,b \rangle, c \rangle, d \rangle$;
    $\langle\langle a, \langle b,c \rangle\rangle, d \rangle$; or
    $\{\langle 1,a \rangle, \langle 2,b \rangle, \langle 3,c \rangle, \langle 4,d \rangle\}$.

Clearly each of these sets has independent stature, and assigning
one as the canonical form precludes the use of the others. The
problem with ordered tuples is compounded in that, though they are
defined as sets, they are excluded from meaningful set operations.
The intersection between the quadruples $\langle a,b,c,d \rangle$ and
$\langle x,b,c,d \rangle$ is always empty unless $a = x$, and even
then it depends on which assignment is used. In another paper [1] the
definition of a 'complex' is presented which preserves the
distinction between different nestings of ordered pairs, does not
require order to be defined by repetition, and does not arbitrarily
exclude certain sets from being operated on by set operations. The
formal definition of a complex is given by the following, where $N$
is the set of natural numbers.

DEFINITION OF A COMPLEX: Any two sets $A$ and $B$ form a
complex $(A;B)$ if and only if

$$(\exists X)(\exists Y)(X \in \{A,B\})(Y \in \{A,B\})\,[(\forall x \in X)(\exists i \in N)(\{\{x\},i\} \in Y) \;\&\; (\forall y \in Y)(\exists j \in N)(\exists x \in X)(\{\{x\},j\} = y)]$$

This definition is stated in such a way as not to presuppose any
ordering in $(A;B)$ of $A$ before $B$, ensuring that a complex be an
unordered coupling of two sets, each bearing a mutual dependence on
the other. The definition states that for every element $x$ of one of
the sets, $X$, the other set, $Y$, contains an element containing a
natural number and a set whose only element is $x$; and that $Y$ is
such that every element of $Y$ contains only a natural number and a
singleton set containing an element of $X$ (either $X = A$ and
$Y = B$, or $X = B$ and $Y = A$, but not both). Let
$A = \{a,b,c\}$,
$B = \{\{\{a\},1\}, \{\{b\},3\}, \{\{c\},963\}, \{\{b\},6\}\}$, and
$C = \{a, b, \{\{b\},3\}, \{\{a\},1\}, \{\{d\},6\}\}$; then $(A;B)$,
$(B;A)$ and $(A \cap C; B \cap C)$ are complexes, while $(A;A)$,
$(A;C)$, $(A; B \cap C)$ and $(A \cap C; B)$ are not complexes.
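
The definition can be checked mechanically on the example sets just given. The sketch below is one possible Python reading of it, not the paper's implementation: sets are modelled as `frozenset`s, an element $\{\{x\},i\}$ is built by the hypothetical helper `tag`, and natural numbers are assumed to start at 1.

```python
def tag(x, i):
    """Build the element {{x}, i} used on the Y side of a complex."""
    return frozenset({frozenset({x}), i})


def _untag(y):
    """Return (x, i) if y has the form {{x}, i} with i >= 1, else None."""
    if not isinstance(y, frozenset) or len(y) != 2:
        return None
    nums = [e for e in y if isinstance(e, int)]
    singles = [e for e in y if isinstance(e, frozenset) and len(e) == 1]
    if len(nums) != 1 or len(singles) != 1 or nums[0] < 1:
        return None
    (x,) = singles[0]
    return x, nums[0]


def _covers(X, Y):
    """True if Y consists only of tagged elements of X and tags all of X."""
    tagged = [_untag(y) for y in Y]
    if any(t is None for t in tagged):
        return False
    if any(x not in X for x, _ in tagged):
        return False
    return all(any(x == tx for tx, _ in tagged) for x in X)


def is_complex(A, B):
    """(A;B) is a complex if the condition holds for X=A, Y=B or X=B, Y=A."""
    return _covers(A, B) or _covers(B, A)


A = frozenset({"a", "b", "c"})
B = frozenset({tag("a", 1), tag("b", 3), tag("c", 963), tag("b", 6)})
C = frozenset({"a", "b", tag("b", 3), tag("a", 1), tag("d", 6)})
```

Run against the sets $A$, $B$ and $C$ above, this check agrees with the classification stated in the text: the first three couplings are accepted and the remaining four are rejected.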
From the definition it should be noticed that if $(A;B)$ is a
complex then $(B;A)$ is the same complex and $A \neq B$. Without
giving a formal definition here, let $x \in_i A$ be understood to
mean that $x$ is in the i-th position of the complex $A$; then a
notational schema for a complex is given by:

DEFINITION SCHEMA: $\{x^i : T(x,i)\} = A$ iff
$[(\forall x)(\forall i \in N)(x \in_i A \leftrightarrow T(x,i)) \;\&\; A \text{ is a complex}]$.

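These results allow a set-theoretic foundation for the following
equivalent notations:

    set
        $\{a,b,c\} = \{a^1,b^1,c^1\}$
    ordered pair
        $\langle a,b \rangle = \{a^1,b^2\}$
    ordered triple
        $\langle a,b,c \rangle = \{a^1,b^2,c^3\}$
    ordered quadruple
        $\langle a,b,c,d \rangle = \{a^1,b^2,c^3,d^4\}$
    ordered pairs of ordered pairs
        $\langle\langle a,b \rangle, \langle c,d \rangle\rangle = \{\{a^1,b^2\}^1, \{c^1,d^2\}^2\}$
        $\langle a, \langle b, \langle c,d \rangle\rangle\rangle = \{a^1, \{b^1, \{c^1,d^2\}^2\}^2\}$
        $\langle a, \langle\langle b,c \rangle, d \rangle\rangle = \{a^1, \{\{b^1,c^2\}^1, d^2\}^2\}$
        $\langle\langle\langle a,b \rangle, c \rangle, d \rangle = \{\{\{a^1,b^2\}^1, c^2\}^1, d^2\}$
        $\langle\langle a, \langle b,c \rangle\rangle, d \rangle = \{\{a^1, \{b^1,c^2\}^2\}^1, d^2\}$
        $\{\langle 1,a \rangle, \langle 2,b \rangle, \langle 3,c \rangle, \langle 4,d \rangle\} = \{\{1^1,a^2\}, \{2^1,b^2\}, \{3^1,c^2\}, \{4^1,d^2\}\}$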


and from the beginning of this section,

$W = \{a^1, b^1, \{\{c^1\}\}, \{a^1, \{b^1,d^1\}^2, c^3\}, \{\{a^1,b^2\}^1, c^2\}\}$

$V = \{\{a^1,b^2,c^3\}, \{\{\{a^1,b^2\}, \{c^1,d^2\}\}^1, \{d^1,a^2\}^2\}, \{\{c^1\}\}, b^1\}$

Since $\{a^1\} = \{a\}$ for all $a$, the exponent 1 is optional. It
should be stressed that the symbol '$x^i$' has no meaning apart from
being enclosed by set brackets. If $A = \{a^6, b^8\}$, then
$a \in_6 A$ and $b \in_8 A$ are true, but $a^6 \in A$ is meaningless.
For examples of set operations between complexes see Figure 2.

1. $\langle a,b,c \rangle \cap \langle x,b,y \rangle = \{b^2\}$

2. $\langle a,b,c \rangle \cup \langle x,y \rangle = \{a^1, x^1, b^2, y^2, c^3\}$

3. $\{a,b,c\} \cap \langle a,x,y \rangle = \langle a \rangle = \{a^1\} = \{a\}$

4. $\bigcup \{a^1, b^2, \{x^1,c^3\}^3, \{y^2,d^4\}^4\} = \{x^1,c^3\} \cup \{y^2,d^4\} = \langle x,y,c,d \rangle$

5. $\langle a,b,z \rangle \,\Delta\, \langle a,y,c \rangle \,\Delta\, \langle x,b,c \rangle = \langle x,y,z \rangle$

6. $\langle a,b,c,d \rangle \sim \langle x,y,c,d \rangle = \langle a,b \rangle$


Figure 2. Set Operations between Complexes.
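
The six identities of Figure 2 can be reproduced with a small model in which a complex is a set of (element, position) pairs. This is only a sketch in Python of the positional notation, not the storage representation the paper describes; the helper names `tup`, `plain`, and `big_union` are hypothetical.

```python
def tup(*xs):
    """The n-tuple <x1,...,xn> as the complex {x1^1, ..., xn^n}."""
    return frozenset((x, i) for i, x in enumerate(xs, start=1))


def plain(*xs):
    """An ordinary set {a, b, ...}, i.e. every element in position 1."""
    return frozenset((x, 1) for x in xs)


def big_union(A):
    """Unary union: unite the elements of A that are themselves complexes."""
    out = frozenset()
    for x, _ in A:
        if isinstance(x, frozenset):
            out |= x
    return out


# Figure 2, items 1 to 6, using Python's built-in set operators.
r1 = tup("a", "b", "c") & tup("x", "b", "y")
r2 = tup("a", "b", "c") | tup("x", "y")
r3 = plain("a", "b", "c") & tup("a", "x", "y")
nested = frozenset({
    ("a", 1),
    ("b", 2),
    (frozenset({("x", 1), ("c", 3)}), 3),
    (frozenset({("y", 2), ("d", 4)}), 4),
})
r4 = big_union(nested)
r5 = tup("a", "b", "z") ^ tup("a", "y", "c") ^ tup("x", "b", "c")
r6 = tup("a", "b", "c", "d") - tup("x", "y", "c", "d")
```

Because position is part of each element, ordinary intersection, union, symmetric difference, and relative complement act on tuples without any special cases, which is the point the figure makes.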


VIII.  SET  OPERATION  SUBROUTINES 


The viability of an STDS rests not only on the speed of the
set operations, but also on their scope. Table I presents some
available set operations for constructing questions in any way
compatible with a parent language. (For those who are not familiar
with the set-theoretic definitions or are not accustomed to the
notation preferred in this monograph, the definitions are given in
the Appendix.) These subroutines are presented in a format compatible
with FORTRAN, and with MAD if periods are added as in the examples to
follow. The argument represented by C in the subroutines can be
deleted. This default case assigns a temporary storage block whose
location is returned in D, as if it were a permanent storage
location, i.e., D = UN(A,B). Since all subroutines operate on the
name of a storage block representing a set, then for all subroutines
that return a name, any degree of nesting of these subroutines within
subroutines is allowable (see examples).

Since the only restriction on a set representation is that it be
isomorphic to the set and have a predefined well-ordering on its
elements, there are many storage configurations available. MODE
allows a choice of different storage configurations for
non-set-theoretic needs. Though all the subroutines appear to be
defined just for sets, they are defined for any complex as well.
Complexes that are not sets allow the extension of binary relation
properties (e.g., domain, image, relative product, restriction, etc.)
to sets of arbitrary-length n-tuples; to make use of them, however,
further delimiters must be included. For example, using 'Q' and an
extra argument, the I-th relative product of A with B could be
QRP(I,A,B,C), the I-th domain of A could be QDM(I,A,C), and
QELM(I,A,B) could represent the question "is A an I-th element of B".

TABLE I

SOME SET OPERATIONS EXPRESSED AS SUBROUTINES

The last column contains an executable expression of the
set-theoretic expression preceding it. D is an indirect name for the
permanent storage with name C, or for temporary storage if the
argument C is deleted (see text).

 No.  Operation                          Set notation               Subroutine call

  1)  UNION                              $C = A \cup B$             D = UN(A,B,C)
                                         $C = \bigcup A$            D = UN(1,A,C)
  2)  INTERSECTION                       $C = A \cap B$             D = IN(A,B,C)
                                         $C = \bigcap A$            D = IN(1,A,C)
  3)  SYMMETRIC DIFFERENCE               $C = A \,\Delta\, B$       D = SD(A,B,C)
                                         $C = \Delta A$             D = SD(1,A,C)
  4)  RELATIVE COMPLEMENT                $C = A \sim B$             D = RL(A,B,C)
  5)  EXACTLY N elements of A            $C = E_n A$                D = EX(N,A,C)
  6)  DOMAIN of A                        $C = \mathcal{D}(A)$       D = DM(A,C)
  7)  RANGE of A                         $C = \mathcal{R}(A)$       D = RG(A,C)
  8)  IMAGE of B under A                 $C = A[B]$                 D = IM(A,B,C)
  9)  CONVERSE IMAGE under A             $C = [B]A$                 D = CM(A,B,C)
 10)  CONVERSE of A                      $C = \breve{A}$            D = CV(A,C)
 11)  RESTRICTION of A to B              $C = A|B$                  D = RS(A,B,C)
 12)  RELATIVE PRODUCT of A and B        $C = A/B$                  D = RP(A,B,C)
 13)  CARTESIAN PRODUCT of A and B       $C = A \times B$           D = XP(A,B,C)
 14)  DOMAIN CONCURRENCE of A to B       $C = \mathcal{D}(A:B)$     D = DC(A,B,C)
 15)  RANGE CONCURRENCE of A to B        $C = \mathcal{R}(A:B)$     D = RC(A,B,C)
 16)  SET CONCURRENCE of A to B          $C = \mathcal{S}(A:B)$     D = SC(A,B,C)
 17)  CARDINALITY of A                   $N = \#A$                  N = C(A)
      (N is an integer)

BOOLEAN OPERATIONS
      I = 1 if the statement is true.
      I = 0 if the statement is false.

 18)  A is a subset of B                                            I = SBS(A,B)
 19)  A is equal to B                                               I = EQL(A,B)
 20)  A and B are disjoint                                          I = DSJ(A,B)
 21)  A is equipollent to B                                         I = EQP(A,B)
 22)  A is an element of B                                          I = ELM(A,B)

SPECIAL CONTROL OPERATIONS

 23)  SET CONSTRUCTION                   $C = \{A,B,X,\ldots\}$     D = S(C,A,B,X,...)
 24)  MODE of A (see text)               N is an integer            N = M(A)
 25)  ACCESS DATA in A by format N                                  D = ACC(N,A,C)
      (each format is written in the parent
      language and given an integer name, N)
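
For readers who want to experiment with the relational entries of Table I (items 6 to 12), the following Python sketch gives direct set-comprehension versions operating on sets of ordered pairs. The lowercase function names merely echo the table's subroutine names; they are hypothetical stand-ins that return a new set instead of writing to a named storage block C, and they make no attempt at the ordered-merge implementation the paper relies on for speed.

```python
def dm(A):
    """DOMAIN of A: first components of the pairs in A."""
    return {x for x, _ in A}


def rg(A):
    """RANGE of A: second components of the pairs in A."""
    return {y for _, y in A}


def im(A, B):
    """IMAGE of B under A: A[B]."""
    return {y for x, y in A if x in B}


def cm(A, B):
    """CONVERSE IMAGE of B under A: [B]A."""
    return {x for x, y in A if y in B}


def cv(A):
    """CONVERSE of A: every pair reversed."""
    return {(y, x) for x, y in A}


def rs(A, B):
    """RESTRICTION of A to B: pairs whose first component is in B."""
    return {(x, y) for x, y in A if x in B}


def rp(A, B):
    """RELATIVE PRODUCT A/B: <x,y> such that <x,z> in A and <z,y> in B."""
    return {(x, y) for x, z in A for z2, y in B if z == z2}


A = {(1, "a"), (2, "b"), (3, "a")}
B = {("a", "X"), ("b", "Y")}
```

Since each function returns a set, calls nest freely, for example `im(cv(A), {"a"})`, mirroring the nesting of subroutines described in the text.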
IX. SOME APPLICATIONS


This section is devoted to examples demonstrating the
applicability of set-theoretic questions. For a germane reference on
computer graphics see Johnson [2]. The first two examples are to give
some indication of execution times. The two examples were run on an
IBM 7090; the times may or may not be characteristic of the potential
speeds in an STDS. With just two examples no claims can be made other
than that two examples were run with the following results:

EXAMPLE 1: Given a population of 24,000 people and a file F
containing a ten-tuple for each person such that each ten-tuple is of
the form < age, sex, marital status, race, political affiliation,
mother tongue, employment status, family size, highest school grade
completed, type of dwelling >, the following five questions were
asked:

a. Find the number of married females.

   Answer: 6,015     Time: 0.50 seconds

b. Find the number of people of Spanish race whose mother tongue
   is not Spanish.

   Answer: 1,352     Time: 0.48 seconds

c. Find the number of people aged 93 or 94.

   Answer: 46        Time: 0.73 seconds

d. Find the number of males and unmarried females.

   Answer: 17,985    Time: 0.55 seconds

e. Find the number of males between the ages of 20 and 40.

   Answer: 588       Time: 0.62 seconds

EXAMPLE 2: Given a population of 3000 people and given two
collections, A and B, of subsets from this population such that A
contains 20 sets of 500 people and B contains 500 sets of 20 people,
find the set of people belonging to some set in A, to all sets in A,
and to an odd number of sets in A; and similarly for B.

    Results                            A-Times     B-Times

    a. people in some set              0.73 sec    0.76 sec
    b. people in all sets              0.48 sec    0.05 sec
    c. people in odd no. of sets       0.76 sec    0.78 sec

A point to notice is that where every element has to be accessed, as
in (a) and (c), the times depend on the total number of elements
included (10,000 for both A and B) and not on the number of sets
involved (20 for A and 500 for B).
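
The three questions of Example 2 correspond to the unary forms of union, intersection, and symmetric difference in Table I, and the related EX operation. A minimal Python sketch, assuming the family is given as a list of Python sets and using a single counting pass, is shown here; it is not the paper's merge-based routine, and the names `un1`, `in1`, `sd1`, and `ex` are hypothetical.

```python
from collections import Counter


def membership_counts(family):
    """Count, for each element, how many sets of the family contain it."""
    counts = Counter()
    for s in family:
        counts.update(set(s))      # each set contributes at most 1 per element
    return counts


def un1(family):
    """Elements belonging to some set of the family."""
    return set(membership_counts(family))


def in1(family):
    """Elements belonging to all sets of the family (empty for no sets)."""
    n = len(family)
    return {x for x, k in membership_counts(family).items() if k == n}


def sd1(family):
    """Elements belonging to an odd number of sets of the family."""
    return {x for x, k in membership_counts(family).items() if k % 2 == 1}


def ex(n, family):
    """Elements belonging to exactly n sets of the family."""
    return {x for x, k in membership_counts(family).items() if k == n}


family = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
```

The counting pass touches every stored element once, which is consistent with the observation that the timings for (a) and (c) track the total number of elements and not the number of sets.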

Examples three and four are presented with MAD as the parent
language; therefore all the subroutine names must end with a period.

EXAMPLE 3: Let six sets A, B, C, D, E, and F be the membership
lists of six country clubs. For each male resident of Ann Arbor, let
there be a datum name in $\beta$ for a data block containing:
person's name, address, phone number, credit rating, age, golf
handicap, wife's name (if any), political affiliation, religious
preference, and salary. The set $\eta$ will contain the names of the
sets, namely: A(0), B(0), C(0), D(0), E(0), F(0). This, along with
the collection $S$ of set operations, allows answering the following
questions.

1) How many members belong to club A or B but not C?

2) Find the phone numbers of members in an odd number of clubs.

3) Get addresses of members belonging to one and only one club.

4) Get addresses and phone numbers of people not in any club.

5) Find members of A that are not also in B but who may be in C
   only if they are not in D, or in E if they are not in F.

6) Get the average credit rating of members belonging to exactly
   three clubs.

The possible questions may become ridiculously involved and may
interact with any spontaneously constructed sets. As an example of
the latter, let X be the set of Ann Arbor males born in Ann Arbor.

7) Find the average age of members born in Ann Arbor and compare
   with the average age of members not born in Ann Arbor.


The answers to (1) through (7), formulated in an STDS, are
expressed below, with N and M representing real numbers, and with BB
for $\beta$ and NN for $\eta$.

1)  N = C.(RL.(UN.(A,B),C))
    ans: N

2)  ACC.(1,SD.(1,NN),Q)
    ans: Q    Format 1 gives phone numbers (see Table I, #25)

3)  ACC.(2,EX.(1,NN),Q)
    ans: Q    Format 2 gives addresses

4)  ACC.(3,RL.(BB,UN.(1,NN)),Q)
    ans: Q    Format 3 gives phone numbers and addresses

5)  RL.(RL.(A,B),UN.(RL.(D,C),RL.(F,E)),Q)
    ans: Q

6)        ACC.(4,EX.(3,NN),Q)
          N = 0
          THROUGH LOOP, FOR I = 1,1,I.G.C.(Q)
    LOOP  N = N + Q(I)
          N = N/C.(Q)
    ans: N    Format 4 gives credit rating

7)        N = 0
          M = 0
          ACC.(5,X,T)
          THROUGH LOOP1, FOR I = 1,1,I.G.C.(T)
    LOOP1 N = N + T(I)
          ACC.(5,RL.(BB,X),P)
          THROUGH LOOP2, FOR I = 1,1,I.G.C.(P)
    LOOP2 M = M + P(I)
          N = N/C.(T)
          M = M/C.(P)
    ans: N and M are the respective average ages.
         Format 5 gives ages.

EXAMPLE 4: Family lineage is easily expressed in an STDS.
With just five initial relations defined over a population U, all
questions concerning family ties may be expressed.

Let U be a population of people and let

    $M = \{\langle x,y \rangle : y \text{ is the mother of } x\}$
    $F = \{\langle x,y \rangle : y \text{ is the father of } x\}$
    $S = \{\langle x,y \rangle : y \text{ is a sister of } x\}$
    $B = \{\langle x,y \rangle : y \text{ is a brother of } x\}$
    $H = \{\langle x,y \rangle : y \text{ is a husband of } x\}$

Let X be any subset of the population U; find

1)  the set G of grandfathers of X.
        $G = F[(F \cup M)[X]]$                       set notation
        IM.(F,IM.(UN.(F,M),X),G)                     in an STDS

2)  the set GF of grandfathers of X on the father's side.
        $GF = F[F[X]]$                               set notation
        IM.(F,IM.(F,X),GF)                           STDS

3)  the set GM of grandfathers of X on the mother's side.
        $GM = G \sim GF$                             set notation
        RL.(G,GF,GM)                                 STDS

4)  the set GR: the grandfather relation over U.
        $GR = (F \cup M)/F$                          set notation
        RP.(UN.(F,M),F,GR)                           STDS

5)  the general relation: $P = \{\langle x,y \rangle : y \text{ is a parent of } x\}$.
        $P = F \cup M$                               set notation
        UN.(F,M,P)                                   STDS

6)  the general relation: Sibling, L.
        $L = S \cup B$                               set notation
        UN.(S,B,L)                                   STDS

7)  the general relation: Children, C.
        $C = \breve{P}$, where $P = F \cup M$        set notation
        CV.(P,C)                                     STDS

8)  the general relation: Aunt, A.
        $A = (P/S) \cup (P/B/\breve{H})$             set notation
        UN.(RP.(P,S),RP.(P,RP.(B,CV.(H))),A)         STDS

9)  the general relation: Wife, W.
        $W = \breve{H}$                              set notation
        CV.(H,W)                                     STDS

10) the general relation: Cousin, K.
        $K = P/L/C$                                  set notation
        RP.(P,RP.(L,C),K)                            STDS

11) the general relation: Half-sibling, HS.
        $HS = P/C \sim (M/\breve{M} \cap F/\breve{F})$   set notation
        RL.(RP.(CV.(C),C),IN.(RP.(M,CV.(M)),
            RP.(F,CV.(F))),HS)                       STDS

12) people in X with no brothers or sisters.
        $Q = X \sim \mathcal{D}(L)$                  set notation
        RL.(X,DM.(L),Q)                              STDS

13) find all relations of X to a set Y such that Y is equal to
    the image of X.
        $Q = \{A : (A \in \eta)(Y = A[X])\}$         set notation

              DC.(X,NN,T)                            STDS
              THROUGH LOOP, FOR I = 1,1,I.G.C.(T)
              B = IM.(T(I),X)
        LOOP  WHENEVER EQL.(Y,B).E.1, UN.(Q,S.(T(I)),Q)

Many more possibilities are available and might be tried by the
reader.
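
As one way of trying such possibilities, the first few lineage questions can be run on a toy population. The sketch below uses Python sets of (child, parent) pairs; the family members and the helper names `im`, `cv`, and `rp` are made up for illustration and only mimic the IM., CV., and RP. subroutines.

```python
def im(A, B):
    """IMAGE of B under A: A[B]."""
    return {y for x, y in A if x in B}


def cv(A):
    """CONVERSE of A."""
    return {(y, x) for x, y in A}


def rp(A, B):
    """RELATIVE PRODUCT A/B."""
    return {(x, y) for x, z in A for z2, y in B if z == z2}


# Hypothetical population: <x, y> means "y is the father/mother of x".
F = {("ann", "bob"), ("bob", "dan"), ("cat", "fred")}
M = {("ann", "cat"), ("bob", "eve"), ("cat", "gail")}
X = {"ann"}

P = F | M                 # 5) parent relation,          P  = F u M
G = im(F, im(P, X))       # 1) grandfathers of X,        G  = F[(F u M)[X]]
GF = im(F, im(F, X))      # 2) paternal grandfathers,    GF = F[F[X]]
GM = G - GF               # 3) maternal grandfathers,    GM = G ~ GF
GR = rp(P, F)             # 4) grandfather relation,     GR = (F u M)/F
C = cv(P)                 # 7) children relation,        C  = converse of P
```

Each line is a direct transcription of the set notation in the corresponding item, which shows how little machinery beyond the relational operations is needed once the five base relations are stored.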
X. CONCLUSION

The purpose of an STDS is to provide a storage representation
for arbitrarily related data allowing quick access, minimal storage,
generality, and extreme flexibility. With the definition of a
complex, a predefined well-ordering, and the operations of set
theory, such a storage representation can be realized.


APPENDIX 


SET-THEORETIC  DEFINITIONS 

Conventions 

The  logical  connectives  'and',  'or',  ' exclusive-or' 
are  represented  by  'a*#  'v',  ' •  'For  all  x',  'for  some  x', 

'for  exactly  n  x'  will  be  represented  by  'Vx',  '3x',  'E(n)lx' 
Parentheses  are  used  for  separation,  and  as  usual  the  concatena 
tion  of  parentheses  will  represent  conjunction. 

'A'  will  be  a  set  if  and  only  if 

a.  it  can  be  represented  formally  by  abstraction 
(i.e.,  A«{x:0(x)}  whe*~  0(x)  is  a  predicate  condition  speci¬ 
fying  the  allowable  elements  *x'); 

b.  'A'  can  be  represented  by  {,}  enclosing  the 
specific  elements  of  'A'. 

Definitions 

The  symbol  'e'  means  'is  an  element  of';  x«A 
reads:  "x  is  an  element  of  A". 

1.  UNION 

a.  binary  union  of  two  sets  A  and  B 

$A \cup B = \{x : (x \in A) \vee (x \in B)\}$

b.  unary union of a family G of sets

$\bigcup G = \{x : (\exists A \in G)(x \in A)\}$

c.  indexed union of a set f(A) over the family G

$\bigcup_{A \in G} f(A) = \{x : (\exists A \in G)(x \in f(A))\}$  .

2.  INTERSECTION 

a.  binary  intersection  of  A  and  B 

$A \cap B = \{x : (x \in A)(x \in B)\}$

b.  unary intersection of a family G

$\bigcap G = \{x : (\forall A \in G)(x \in A)\}$

c.  indexed intersection of f(A) over the family G

$\bigcap_{A \in G} f(A) = \{x : (\forall A \in G)(x \in f(A))\}$  .
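
The unary and indexed forms of union and intersection can be sketched in Python as folds over a family of sets. This is only an illustration under stated assumptions: the family is any iterable of Python sets, `f` is any function returning a set, and the function names are hypothetical rather than STDS operations.

```python
def unary_union(family):
    """Elements belonging to at least one member of family."""
    out = set()
    for member in family:
        out |= member
    return out


def unary_intersection(family):
    """Elements belonging to every member of a non-empty family."""
    members = list(family)
    if not members:
        # With no universe of discourse available, the empty case is rejected.
        raise ValueError("intersection of an empty family is not defined here")
    out = set(members[0])
    for member in members[1:]:
        out &= member
    return out


def indexed_union(f, family):
    """Union of f(A) taken over every A in family."""
    return unary_union(f(member) for member in family)


def indexed_intersection(f, family):
    """Intersection of f(A) taken over every A in family."""
    return unary_intersection(f(member) for member in family)
```

The binary forms are the built-in operators `A | B` and `A & B`; the unary forms reduce to repeated application of them.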

3.  SYMMETRIC  DIFFERENCE 

a.  binary  symmetric  difference  of  A  and  B 

$A \,\Delta\, B = \{x : (x \in A) \,\Delta\, (x \in B)\}$*

*  even though the symbol '$\Delta$'
has two different meanings,
no confusion is likely

b.  unary symmetric difference of G

$\Delta G = \{x : (\text{for an odd number of } A \in G)(x \in A)\}$

c.  indexed symmetric difference of f(A) over G

$\Delta_{A \in G} f(A) = \{x : (\text{for odd no. of } A \in G)(x \in f(A))\}$  .

4.  RELATIVE  COMPLEMENT 

$A \sim B = \{x : (x \in A)(x \notin B)\}$  .

5.  EXACTLY  N! 

the set of elements common to exactly 'n' elements
of a given set G is represented by:

$E_n G = \{x : (E(n)!A \in G)(x \in A)\}$  .
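
Unary symmetric difference and the "exactly n" operation both depend only on how many members of G contain each element, so one counting pass serves both. The sketch below is a hypothetical Python illustration (the names `membership_counts`, `unary_symmetric_difference`, and `exactly_n` are not from the paper) and assumes G is given as an iterable of distinct sets.

```python
from collections import Counter


def membership_counts(family):
    """Map each element to the number of members of family that contain it."""
    counts = Counter()
    for member in family:
        counts.update(set(member))
    return counts


def unary_symmetric_difference(family):
    """Elements contained in an odd number of members of family."""
    return {x for x, k in membership_counts(family).items() if k % 2 == 1}


def exactly_n(n, family):
    """Elements contained in exactly n members of family, for n >= 1."""
    return {x for x, k in membership_counts(family).items() if k == n}
```

For two sets the odd-count rule reduces to Python's `A ^ B`, and the relative complement of definition 4 is `A - B`.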
6.  DOMAIN of a set A

$\mathcal{D}(A) = \{x : (\exists y)(\langle x,y\rangle \in A)\}$*  .

*  $\langle x,y\rangle$ represents an ordered pair

7.  RANGE of a set A

$\mathcal{R}(A) = \{y : (\exists x)(\langle x,y\rangle \in A)\}$  .

8.  IMAGE of B under A

$A[B] = \{y : (\exists x \in B)(\langle x,y\rangle \in A)\}$  .

9.  CONVERSE IMAGE of B under A

$[B]A = \{x : (\exists y \in B)(\langle x,y\rangle \in A)\}$  .

10.  CONVERSE of A

$\breve{A} = \{\langle y,x\rangle : \langle x,y\rangle \in A\}$  .

11.  RESTRICTION

$A|B = \{\langle x,y\rangle : (\langle x,y\rangle \in A)(x \in B)\}$  .

12.  RELATIVE PRODUCT of A and B

$A/B = \{\langle x,y\rangle : (\exists z)(\langle x,z\rangle \in A)(\langle z,y\rangle \in B)\}$  .

13.  CARTESIAN PRODUCT of A and B

$A \times B = \{\langle x,y\rangle : (x \in A)(y \in B)\}$  .
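
Definitions 6 through 13 translate almost directly into set comprehensions. The sketch below is an illustration only: it assumes every element of a relation is a 2-tuple (the definitions themselves allow arbitrary sets and simply ignore non-pairs), and the function names are hypothetical.

```python
def domain(a):
    return {x for (x, _) in a}


def range_(a):
    return {y for (_, y) in a}


def image(a, b):
    """Image of b under a."""
    return {y for (x, y) in a if x in b}


def converse_image(a, b):
    """Converse image of b under a."""
    return {x for (x, y) in a if y in b}


def converse(a):
    return {(y, x) for (x, y) in a}


def restriction(a, b):
    """Pairs of a whose first component lies in b."""
    return {(x, y) for (x, y) in a if x in b}


def relative_product(a, b):
    """Pairs (x, y) linked through some z with (x, z) in a and (z, y) in b."""
    return {(x, y) for (x, z) in a for (w, y) in b if z == w}


def cartesian_product(a, b):
    return {(x, y) for x in a for y in b}
```

The nested comprehension in `relative_product` is quadratic in the sizes of the two relations; it is written for clarity, not as a statement about how an STDS would execute the operation.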

14.  DOMAIN CONCURRENCE of X relative to A

$\mathfrak{D}(X:A) = \{B : (B \in A)(X \subseteq \mathcal{D}(B))\}$  .

15.  RANGE CONCURRENCE of X relative to A

$\mathfrak{R}(X:A) = \{B : (B \in A)(X \subseteq \mathcal{R}(B))\}$  .

16.  SET CONCURRENCE of X relative to A

$\mathfrak{S}(X:A) = \{B : (B \in A)(X \subseteq B)\}$  .
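
The three concurrence operations are filters over a family of sets that differ only in what X is tested against. A minimal Python sketch, assuming the family is a set of frozensets and, for the domain and range variants, that each member holds only 2-tuples; all function names are hypothetical:

```python
def domain(b):
    return {x for (x, _) in b}


def range_(b):
    return {y for (_, y) in b}


def domain_concurrence(xs, family):
    """Members of family whose domain includes every element of xs."""
    return {b for b in family if xs <= domain(b)}


def range_concurrence(xs, family):
    """Members of family whose range includes every element of xs."""
    return {b for b in family if xs <= range_(b)}


def set_concurrence(xs, family):
    """Members of family that include every element of xs."""
    return {b for b in family if xs <= b}
```

Each function scans the whole family, so the cost grows with the number of members; the sketch makes no claim about how an STDS would avoid that scan.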

17.  CARDINALITY  of  A 

$\#A = n$ iff there are exactly n elements
in A.

18.  A is a SUBSET of B iff every element of A is an
element of B:  $A \subseteq B \leftrightarrow (\forall x)(x \in A \rightarrow x \in B)$  .

19.  A is EQUAL to B iff A is a subset of B, and
B is a subset of A:  $A = B \leftrightarrow (A \subseteq B \;\&\; B \subseteq A)$  .

20.  A and B are DISJOINT iff the intersection of A
and B is empty:  $A \cap B = \emptyset$  .

21.  A is EQUIPOLLENT to B iff A and B contain the
same number of elements:  $\#A = \#B$  .


GLOSSARY OF SYMBOLS


Symbol                  Symbol Definition

iff                     if and only if
$=$                     Identity
$\wedge$                Conjunction
$\vee$                  Disjunction
$\Delta$                Exclusive or
$\rightarrow$           Implication (if ... then)
$\leftrightarrow$       Equivalence
$\forall x$             Universal quantifier (for all)
$\exists x$             Existential quantifier (for some)
$E!x$                   Uniqueness quantifier (for exactly one)
$Ox$                    Odd quantifier (for an odd number of)
$E(n)!x$                Exact number quantifier
$\in$                   Set membership
$\emptyset$             Empty set
$\notin$                Non-membership
$\subseteq$             Set inclusion
$A \cap B$              Intersection
$A \cup B$              Union
$A \,\Delta\, B$        Symmetric difference
$A \sim B$              Relative complement
$\langle x,y\rangle$    Ordered pair
$\{x : \theta(x)\}$     Definition by abstraction
$xAy$                   Ordered pair $\langle x,y\rangle$ contained in A
$\bigcup G$             Union or sum of G
$\bigcap G$             Intersection of G
$\Delta G$              Symmetric difference of G
$E_n G$                 Elements contained in exactly n elements of G
$A \times B$            Cartesian product
$\mathcal{D}(A)$        Domain of A
$\mathcal{R}(A)$        Range of A
$\breve{A}$             Converse of A
$A/B$                   Relative product of A and B
$A|X$                   A restricted to X
$A[X]$                  Image of X under A
$[X]A$                  Converse-image of X under A
$\mathfrak{D}(X)$       Domain concurrence of X
$\mathfrak{R}(X)$       Range concurrence of X
$\mathfrak{S}(X)$       Set concurrence of X
$C(A)$                  Total cardinality of A